METROID MUSIC FORMAT v1.0 By Justin Olbrantz (Quantam) The permanent home of this specification and others, with the latest updates, is https://github.com/TheRealQuantam/RetroDocs. This specification describes the music format of the "Metroid music engine" used by Metroid as well as at least 2 other games: Gumshoe and Kid Icarus, all by Nintendo Research and Development 1. The Metroid engine is 1 of a series of incompatible versions of the Hirokazu Tanaka engine used mostly by R&D1 but also a couple other games he composed for. The Tanaka engine is derived from the very first Nintendo NES engine believed to have been written by Yukio Kaneoka which would serve as the progenitor of most of Nintendo's early-mid engines, including those by Tanaka, Koji Kondo, and Akito Nakatsuka. Later versions of the Tanaka engine would be used most notably by Mother and Tetris. The Metroid sound engine is a very simple and relatively inflexible engine, though not quite to the degree of earlier R&D4 games such as The Legend of Zelda. It is a hybrid engine in which some parameters are specified in the music data while others are hard-coded in the engine. Unlike more flexible formats, timbre is split between the track header (for per-track specification) and hard-coding (an alternate but far more rigid type of per-track specification), and channel data is limited to notes, rests, and loops. Basic Features: - Single byte notes and rests, though current note length must be previously defined with an additional byte + Hybrid tempo system that mimics staff notation divisive rhythm - 5-octave chromatic scale from A1 (55 Hz) to F7 (2794 Hz) for square wave (triangle is 1 octave lower). - 9 note lengths from whole notes through 16th notes with dots and triplets for some lengths for all tempos, plus several tempo-specific lengths - 3 tempos ranging from 128.6 to 225 BPM - 4 channels - square 1, square 2, triangle, noise - each with separate event sequences + Limited timbre support - Hard-coded per-track duty cycle and volume selection for square wave channels - Per-track selection of non-looping envelopes for square wave channels - Per-track triangle wave channel auto-release settings - Global set of 3 noise presets - Single-level unnested loops - Whole track looping only supports looping to the beginning of the track - Limit of 256 bytes of data per track channel Some general notes in interpreting this specification: - All word (16-bit) values in the format are little-endian, with the least-significant byte preceding the most-significant, unless noted otherwise. - All values are unsigned unless noted otherwise. - All numbers preceded by $ are hex, and in general unless they represent byte strings or addresses most numbers without $ are in decimal. - Byte values are generally shown as hex "ab" where "a" is the high nibble and "b" is the low nibble (standard mathematical digit ordering). However, when it is necessary to show bit fields, bytes are shown in binary as "abcd efgh" (mathematical order); in bit fields "-" means that the value of the bit does not matter in the given context. - Many command lists (e.g. channel data commands) are encoded ambiguously, where 1 value could be interpreted as multiple different commands (e.g. 00 could match 0n). These lists are ordered such that the first matching command is the correct interpretation. Track Structure Metroid has a rather unusual structure compared to other games. Rather than consolidate all its music code and data into a single bank and swap to and from that bank when it comes time to update the music each frame, Metroid consolidates all its code and data for an area into a single bank and never switches banks, duplicating code and data needed by multiple areas but not able to fit in the common bank. All of its music code and non-track data is duplicated across all banks containing level data, as are a couple tracks, while other tracks are bank-specific. Fortunately, all cross-bank music code and data is at the same address in each bank (in the $a000-$bfff region). To convert addresses to file offsets: file offset = bank * $4000 + address - $7ff0 The $8000-$bfff banks used for different parts of the game: 0: Title screen and ending 1: Brinstar 2: Norfair 3: Tourian 4: Kraid's lair 5: Ridley's lair Metroid has 12 music tracks, the details of which can be found in Additional Tables. Most importantly, the track header offset table is located at $bbfa. This array of bytes specifies the offset of the track headers within the track header block at $bd31. Note that while the order of tracks is universal, track headers potentially differ between banks and track headers are only guaranteed to contain the correct data for tracks that exist in the bank (sometimes headers have obviously invalid channel data addresses for tracks that reside in a different bank). Track Header Format: +0 byte: 000t tttt: Note length table base offset t. Implicitly defines the tempo. 0: 225 BPM b: 150 BPM 17: ~128.6 BPM (900/7 BPM) +1 byte: At the end of the track: if non-0 (typically $ff) restart the track from the beginning, else stop. +2 byte: FFFF LLLL: Triangle channel note auto-release control. Do the action of the first matching condition: L != 0: Set auto-release to L/4 frames F != 0: Disable auto-release 0: Dynamic auto-release: silence note 1 frame before the note's end or after 15 frames, whichever is sooner +3 byte[2]: Square channel (1 then 2) envelope numbers 1-5 or 0 if none +5 word[4]: Channel data start addresses in order of square 1, square 2, triangle, noise, or 0 if none Example: 0B FF 00 02 03 00 B0 57 B0 C1 B0 2B B1 0B: Note length table base offset $b (150 BPM) FF: Loop track when it reaches the end 00: Set dynamic triangle auto-release 02: Square channel 1 envelope number 2 03: Square channel 2 envelope number 3 00 B0: Square channel 1 address $b000 57 B0: Square channel 2 address $b057 C1 B0: Triangle channel address $b0c1 2B B1: Noise channel address $b12b Time and Tempo Note/rest durations in the Metroid engine are frame-based. A master table of somewhat indistinct size (but at least $22 entries) specifies note durations in frames. Each track selects a $10-element window into this table for use by the entire track, and set note length commands select 1 of these lengths. This table is arranged (and used in Metroid music) to provide 3 different tempos with a common set of 9 musical lengths with common indices; various other lengths are available based on the window used for the track, but they are not common between tempos. The master table giving lengths in frames is: 0 1 2 3 4 5 6 7 8 9 a b c d e f 0: 4, 8, 16, 32, 64, 24, 48, 12, 11, 5, 2, 6, 12, 24, 48, 96 $10: 36, 72, 18, 16, 8, 3, 16, 7, 14, 28, 56,112, 42, 84, 21, 18 $20: 2, 3, 32,252,179,173, 77, 6 Metroid tracks use the base indices 0 for 225 BPM, $b for 150 BPM, and $17 for 128.6 BPM. Within these windows only lengths 0-$a are ever used, implying the intended table length is $b entries (though all $10 are listed below). This produces the following length tables: 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 (225 BPM): 4, 8, 16, 32, 64, 24, 48, 12, 11, 5, 2, 6, 12, 24, 48, 96 b (150 BPM): 6, 12, 24, 48, 96, 36, 72, 18, 16, 8, 3, 16, 7, 14, 28, 56 17 (128.6 BPM): 7, 14, 28, 56, 112, 42, 84, 21, 18, 2, 3, 32,252,179,173, 77 Using these windows, the following standard length codes are defined: Whole Half 4th 8th 16th Normal: 4 3 2 1 0 Dotted: 6 5 7 Triplet: 8* * Index 8 in all 3 tempos corresponds to a triplet quarter note, though it is rounded (inconsistently, at that) in all but 150 BPM. No apparent correspondence exists among any other indices across all tempos. Notes on the triangle wave channel are capable of auto-release prior to the end of the specified note duration. Depending on the auto-release setting in the track header, notes may be set to either never auto-release, auto-release after a fixed number of frames (up to 4), or dynamic auto-release of notes 1 frame before the end of the note or after 15 frames, whichever is sooner. Notes and Rests Metroid uses 2 different key sets, 1 for melodic channels (though triangle channel is 1 octave lower than the square channels), and 1 set of presets for the noise channel. The 2 do have 1 thing in common, however: while the ways key/preset numbers are encoded differs between the 2, both use key code 1 as rest. Metroid has a peculiar melodic key set, as the set is unusually cramped even in comparison to other games using the same engine (although Gumshoe's key numbers are stranger still); all other games have more than $40 keys, and are not missing keys from octave 2. The Metroid engine has a maximum of $57 key codes (+ rest), and even higher is possible using an exploit; and if relocated the table could be expanded, though this is not entirely trivial and would need to be done for all banks. Key Table: Oct C C# D D# E F F# G G# A A# B Max Detune at B 1: 0* 0.96 cents 2: 2 3 4 5 6 7 8 9 a b 1.91 cents 3: c d e f 10 11 12 13 14 15 16 17 3.83 cents 4: 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 7.69 cents 5: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 15.49 cents 6: 30 31 32 33 34 35 36 37 38 39 3a 3b 31.41 cents 7: 3c 3d 3e 3f * Key code 0 is an exploit: if key code 0 IMMEDIATELY follows a set length command it will be assumed to be a note/rest rather than being interpreted as an end of track command. While not relevant to the Metroid set of only $40 keys, this same exploit can be used with keys above $57. The octaves listed in the table are for square wave channels, and the triangle channel will be 1 octave lower for each key code. As well, as can be seen from the table, Metroid has a nominal base key of C2, but codes 0-3 are irregular. Noise codes are peculiar. Rather than being preset numbers, they are 1-based byte-offsets into an array of noise presets, each entry being 3 bytes. Metroid has the following presets (and, as previously stated, code 1 is rest): Noise Key Table: 4: Snare tap 7: Snare full stroke a: Snare rim Channel Data Channel data is quite basic, having just enough commands to be considered a minimally competent engine: notes, rests, set note length, loops, and end of track. Keys and rests are encoded slightly differently between melodic and noise channels, but channel data is otherwise identical. 1 major difference between the Metroid music format and non-Nintendo engines is that while each channel has separate music data, the end of track command applies globally, and all channels will stop (or loop, depending on the track header) when ANY channel encounters the end of track command. Because of this, it is not necessary to explicitly terminate all channels, only the 1 that will reach its end first. Channel Data Format: 00: End of track. Either stop or restart the track when any channel encounters this. ff: End loop. If loop counter > 0 decrement and loop, else continue cx (11nn nnnn): Begin loop and set loop counter to n - 1 (n is the number of total plays 1-$3e or 0 is interpreted as $100) bL: Set current note length to index L; for triangle channel auto-release is set as described in the auto-release entry of the track header. Due to a quirk of the engine, the set length command must be IMMEDIATELY followed by a note or rest. Melodic channels: 02: Rest for current note length kk (0kkk kkk0): Play melodic key code k for current note length Noise channel: 01: Rest for current note length 0p: Play noise preset code p (4, 7, or $a) for current note length As previously stated, any byte immediately following a set length command is interpreted as a note/rest, allowing access to keys normally inaccessible; however, because of the extremely limited key set in Metroid this only (additionally) allows access to key code 0. Examples Triangle Channel (tempo is 128.6 BPM, auto-release is set to dynamic): CA: Begin loop n = $a: Set loop counter to 9 (play 10 times in total) B0: Set note length L = 0: 16th note (7 frames). Notes will auto-release after 6 frames. 2A x3: Notes k = $15: Key A2 02 x2: Rests FF: End loop B2: Set note length L = 2: Quarter note (28 frames). Notes will auto-release after 15 frames (~8th note). 34 x2: Notes k = $1a: Key D3 Triangle Channel (tempo is 150 BPM, auto-release is set to 5/4 frames) C0: Set note length L = 0: 16th note (6 frames). Notes will auto-release after 5/4 frames. 38: Note k = $1c: Key E3 3A: Note k = $1d: Key F3 Timbre Metroid has limited support for different timbres in the square wave channels in 2 parts: default ctrl 1 register ($4000, $4004) settings, and non-looping envelopes. Envelopes are specified on a per-track basis in the track's header, and each value in the envelope is applied for 1 frame. Both mechanisms revolve around the ctrl 1 registers, which have the format ddLV vvvv: d: Duty cycle. 0: 12.5%, 1: 25%, 2: 50%, 3: 75% L: Disable length counter and play note forever. This is mostly irrelevant since the length counter in Metroid is set to 127 frames, which is longer than the longest length in the note length table. V: Specifies that v is the volume and not decay rate. Always set. v: The volume Envelope Format: f0: End envelope and silence channel ff: End envelope and return ctrl 1 to track default, or $10 if default is 0 vv: Set ctrl 1 to v | (track default & $f0) And the 5 Metroid envelopes, showing 1 character for the volume for each envelope entry except the terminator: 1 @bcba (3cca): 1223345678 ff 2 @bcc5 (3cd5): 245678765 ff 3 @bccf (3cdf): 0d97655544 ff 4 @bcda (3cea): 2677766665554443333233333222222222211111 f0 5 @bd03 (3d13): aa9876543277654432225554322211443212211122211 f0 The intent of envelopes is to specify the volume via envelope while the d, L, and V fields are specified by the track default. However, this is not enforced and the entire value of v is ORed with the high nibble of the channel default. If the channel default leaves, say, the d bits 0, the envelope values are free to set them on a per-frame basis. The potential problem with this, however, is that if the envelope ends in $ff to return the ctrl 1 register to its default value, that will set the d field to 0 (12.5% duty cycle). As such this method is only applicable when this result is desired or when the end of the envelope silences the channel ($f0). As for the track defaults themselves, they are... more difficult. They are in fact hard-coded by the engine. But there is no table of values; that would be far too easy and obvious. Rather, there are 2 tables - 1 for tracks 0-7 and the other for tracks 8-$b - that contain the addresses of code snippets which are called to SET the ctrl 1 values directly. This utterly obtuse system is explained in detail in a later section. But you'd be far better off just using my Metroid Music Table Patch (available from my retro docs URL, among other places), which replaces the system with a simple table of track defaults that's easy to modify. The details of how this patch works are also described in the aforementioned section. Additional Tables and Information Reminder: with the exception of certain track data and their corresponding headers, all music code and data is duplicated in banks 0-5, and any changes will need to be, as well. Track Table # Name Bks Off TO Lp TriL Sq1T Sq2T Sq1A Sq2A TriA NoiA 0 (0) Ridley's Lair @4/5:+41: b ff f:0 b3:1 b3:1 b022 b031 b000 0 1 (1) Tourian @all:+8f: b ff 0:3 92:0 96:0 be59 be47 be62 0 2 (2) Item Room @all:+34: b ff 0:3 92:0 96:0 bdda bddc bdcd 0 3 (3) Kraid's Lair @4/5:+27: 0 ff f:0 92:0 96:0 b03f b041 b0aa 0 4 (4) Norfair @ 2:+1a: b ff f:0 34:4 34:4 b000 b026 b057 b08b 5 (5) Escape @ 3: +d: b ff 0:0 f4:2 f4:2 b04d b000 b0cf b15a 6 (6) Mother Brain @ 3: +0: b ff f:5 92:0 96:0 b18c b18e b161 0 7 (7) Brinstar @ 1:+82: b ff 0:0 34:2 34:3 b000 b057 b0c1 b12b 8 (8) Samus Appears @all:+68: b 0 f:0 34:2 34:0 be3e be1d be36 0 9 (9) Item Fanfare @all:+75: 0 0 f:0 b6:1 f6:0 bdf7 be0d be08 0 10 (a) Ending @ 0:+4e: 17 0 0:0 f5:2 f6:1 ac00 adc5 acf5 ae8e 11 (b) Title Theme @ 0:+5b: 17 0 f:0 b6:2 f6:5 b0b9 b000 b076 b115 Table legend: Bks: Bank(s) the track exists in Off: Offset of the track header in the track header block TO: Note length table base offset (tempo) Lp: Loop setting - 0 if non-looping, non-0 if looping TriL: Triangle auto-release control (F:L) Sq1T: Square 1 channel timbre settings. The left side is the ctrl 1 value, the right side is the envelope number if any. Sq2T: Square 2 channel timbre settings. Sq1A/Sq2A/TriA/NoiA: Addresses of channel data if any bbfa byte[$c]: Track header offsets table. Offsets are relative to the track header block. bd31 structs: Track header block base Code to load a track header: BF26 LDA $BBFA,Y BF29 TAY BF2A LDX #$00 BF2C LDA $BD31,Y BF2F STA $062B,X BF32 INY BF33 INX BF34 TXA BF35 CMP #$0D BF37 BNE $BF2C bef7 byte[$22?]: Note length table in frames Code to load the current length for notes: BB2B LDA $BEF7,Y Code to limit the triangle auto-release length to 15 frames ($3c quarter-frames) when in dynamic auto-release mode: BBD2 CMP #$3C BBD4 BCC $BBD8 BBD6 LDA #$3C be77 big-endian word [$40]: Key pitch table, in period register units. For square wave channels, period = 1789773 / frequency / 16 - 1 (triangle channel is 1 octave lower). Entry 1 must be 0 as it is rest. Code to load the period for a key: BB49 LDA $BE78,Y BB4C BEQ $BB59 BB4E STA $0600,X BB51 LDA $BE77,Y BB54 ORA #$08 BB56 STA $0601,X b200 struct[4]: Noise presets table. Data actually starts at offset 1 ($b201). The first entry must be silence (rest). +0 byte: Ctrl 1 value +1 word: Period regs value Code to play a noise preset: BBE5 LDA $B200,Y BBE8 STA NoiseVolume_400C BBEB LDA $B201,Y BBEE STA NoisePeriod_400E BBF1 LDA $B202,Y BBF4 STA NoiseLength_400F bcb0 word[5]: Envelope addresses (0-based, unlike in track headers) Code to load an envelope address: BA5C LDA $BCB0,Y BA5F STA $EC BA61 LDA $BCB1,Y BA64 STA $ED Changing the Track Ctrl 1 Values 1 section of the begin track process selects the ctrl 1 values for the track by looking up the track number in a jump table and then jumping to the corresponding address. These addresses point to short code snippets that assign hard-coded ctrl 1 values and then jump back to a common point to continue the start process. It is difficult to conceive of any circumstance in which this was a good idea. bc26 word[8]: Jump table for tracks 0-7 bc06 word[4]: Jump table for tracks 8-$b 6 possible jump table targets are defined in Metroid. For some reason the addresses in the table actually point to jump stubs which jump to a second address where the values are actually set. The following shows the addresses used in the table (the jump stubs), the actual addresses of the code which does the deed, and the ctrl 1 values set. Table Actual S1 S2 bc77: bcaa 92 96 bc7a: bca4 b6 f6 bc7d: bc9a f4 f4 bc80: bc96 34 34 bc83: bc89 b3 b3 bc86: bc9e f5 f6 Switching tracks between 1 of these predefined addresses is not too difficult, and requires only changing the corresponding address in the jump tables. The next step in difficulty comes from modifying the values themselves. These code snippets take 2 forms, depending on whether both channels' ctrl 1 values are the same or different. Example code to assign both channels the same value: BC9A LDA #$F4 BC9C BNE $BC8B The A register is assigned the value to be used by both, and $bc8b will copy it. Example code to assign different values to each channel: BCA4 LDX #$B6 BCA6 LDY #$F6 BCA8 BNE $BC8D Here X is assigned the square 1 channel value, and Y the square 2 channel value. $bc8d then applies these values. The problem with this is that there are a limited number of possibilities to work with: 3 same-value options and 3 2-value options. If you're lucky, it will be sufficient to exploit the fact that not all tracks are present in all banks to make bank-specific changes that only affect the tracks you need to modify. If you're less lucky you have to actually find space to put additional snippets. Or, better yet, you might just solve the problem entirely. Here is the full code for this mechanism, including the jump stubs, snippets, and post-snippet code: BC77 JMP $BCAA <- jump stubs ($12 bytes) BC7A JMP $BCA4 BC7D JMP $BC9A BC80 JMP $BC96 BC83 JMP $BC89 BC86 JMP $BC9E BC89 LDA #$B3 <- 1 of the code snippets (2 bytes) BC8B TAX <- code to copy A to X and Y for same value snippets (2 bytes) BC8C TAY BC8D JSR $B9E4 <- common return point for all snippets ($b bytes) BC90 JSR $BF19 BC93 JMP $BAA5 BC96 LDA #$34 <- most of the snippets ($1a bytes) BC98 BNE $BC8B BC9A LDA #$F4 BC9C BNE $BC8B BC9E LDX #$F5 BCA0 LDY #$F6 BCA2 BNE $BC8D BCA4 LDX #$B6 BCA6 LDY #$F6 BCA8 BNE $BC8D BCAA LDX #$92 BCAC LDY #$96 BCAE BNE $BC8D So we have a total of $2e bytes to play with between the jump stubs and the code snippets. However, if the snippets are to be replaced by a table of values, the 2 bytes to copy from A to X and Y are also unnecessary, bringing the total to $30 bytes. To change to a table-based system, we'd make all the entries in the jump table point to a single handler which would load the values from a lookup table. The lookup table itself would have to be $c * 2 = $18 bytes. That leaves $18 bytes for the code, which is quite luxurious. Helpfully, the code snippets already clobber A, X, and Y, so we don't even need to preserve them. 1 key piece of information not shown in the above code is that these jump stubs are reached with the track number (0-$b) stored at memory address $65e. The full code thus looks like this: BC77 LDA $065E BC7A ASL A BC7B TAY BC7C LDX $BC96,Y BC7F LDA $BC97,Y BC82 TAY BC83 JMP $BC8D Then at $bc96 we inject the table: B3 B3 92 96 92 96 92 96 34 34 F4 F4 92 96 34 34 34 34 B6 F6 F5 F6 B6 F6 And of course the jump tables then have all values set to $bc77 (77 BC).